AITopics | id image

Collaborating Authors

id image

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

An Empirical Analysis of VLM-based OOD Detection: Mechanisms, Advantages, and Sensitivity

Lee, Yuxiao, Cao, Xiaofeng, Ye, Wei, Yao, Jiangchao, Song, Jingkuan, Shen, Heng Tao

arXiv.org Artificial IntelligenceSep-18-2025

Vision-Language Models (VLMs), such as CLIP, have demonstrated remarkable zero-shot out-of-distribution (OOD) detection capabilities, vital for reliable AI systems. Despite this promising capability, a comprehensive understanding of (1) why they work so effectively, (2) what advantages do they have over single-modal methods, and (3) how is their behavioral robustness -- remains notably incomplete within the research community. This paper presents a systematic empirical analysis of VLM-based OOD detection using in-distribution (ID) and OOD prompts. (1) Mechanisms: We systematically characterize and formalize key operational properties within the VLM embedding space that facilitate zero-shot OOD detection. (2) Advantages: We empirically quantify the superiority of these models over established single-modal approaches, attributing this distinct advantage to the VLM's capacity to leverage rich semantic novelty. (3) Sensitivity: We uncovers a significant and previously under-explored asymmetry in their robustness profile: while exhibiting resilience to common image noise, these VLM-based methods are highly sensitive to prompt phrasing. Our findings contribute a more structured understanding of the strengths and critical vulnerabilities inherent in VLM-based OOD detection, offering crucial, empirically-grounded guidance for developing more robust and reliable future designs.

data mining, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2509.13375

Country: Europe > Switzerland (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(5 more...)

Add feedback

FodFoM: Fake Outlier Data by Foundation Models Creates Stronger Visual Out-of-Distribution Detector

Chen, Jiankang, Deng, Ling, Gan, Zhiyong, Zheng, Wei-Shi, Wang, Ruixuan

arXiv.org Artificial IntelligenceNov-22-2024

Out-of-Distribution (OOD) detection is crucial when deploying machine learning models in open-world applications. The core challenge in OOD detection is mitigating the model's overconfidence on OOD data. While recent methods using auxiliary outlier datasets or synthesizing outlier features have shown promising OOD detection performance, they are limited due to costly data collection or simplified assumptions. In this paper, we propose a novel OOD detection framework FodFoM that innovatively combines multiple foundation models to generate two types of challenging fake outlier images for classifier training. The first type is based on BLIP-2's image captioning capability, CLIP's vision-language knowledge, and Stable Diffusion's image generation ability. Jointly utilizing these foundation models constructs fake outlier images which are semantically similar to but different from in-distribution (ID) images. For the second type, GroundingDINO's object detection ability is utilized to help construct pure background images by blurring foreground ID objects in ID images. The proposed framework can be flexibly combined with multiple existing OOD detection methods. Extensive empirical evaluations show that image classifiers with the help of constructed fake images can more accurately differentiate real OOD images from ID ones. New state-of-the-art OOD detection performance is achieved on multiple benchmarks. The code is available at \url{https://github.com/Cverchen/ACMMM2024-FodFoM}.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.05293

Country:

North America > Canada > Ontario > Toronto (0.14)
Oceania > Australia > Victoria > Melbourne (0.05)
Asia > China > Guangdong Province > Guangzhou (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

Ji, Yatai, Zhang, Shilong, Wu, Jie, Sun, Peize, Chen, Weifeng, Xiao, Xuefeng, Yang, Sidi, Yang, Yujiu, Luo, Ping

arXiv.org Artificial IntelligenceJul-10-2024

The rapid advancement of Large Vision-Language models (LVLMs) has demonstrated a spectrum of emergent capabilities. Nevertheless, current models only focus on the visual content of a single scenario, while their ability to associate instances across different scenes has not yet been explored, which is essential for understanding complex visual content, such as movies with multiple characters and intricate plots. Towards movie understanding, a critical initial step for LVLMs is to unleash the potential of character identities memory and recognition across multiple visual scenarios. To achieve the goal, we propose visual instruction tuning with ID reference and develop an ID-Aware Large Vision-Language Model, IDA-VLM. Furthermore, our research introduces a novel benchmark MM-ID, to examine LVLMs on instance IDs memory and recognition across four dimensions: matching, location, question-answering, and captioning. Our findings highlight the limitations of existing LVLMs in recognizing and associating instance identities with ID reference. This paper paves the way for future artificial intelligence systems to possess multi-identity visual inputs, thereby facilitating the comprehension of complex visual narratives like movies.

id reference, instruction, test image, (17 more...)

arXiv.org Artificial Intelligence

2407.07577

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.70)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

Li, Zhen, Cao, Mingdeng, Wang, Xintao, Qi, Zhongang, Cheng, Ming-Ming, Shan, Ying

arXiv.org Artificial IntelligenceDec-7-2023

Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts. However, existing personalized generation methods cannot simultaneously satisfy the requirements of high efficiency, promising identity (ID) fidelity, and flexible text controllability. In this work, we introduce PhotoMaker, an efficient personalized text-to-image generation method, which mainly encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information. Such an embedding, serving as a unified ID representation, can not only encapsulate the characteristics of the same input ID comprehensively, but also accommodate the characteristics of different IDs for subsequent integration. This paves the way for more intriguing and practically valuable applications. Besides, to drive the training of our PhotoMaker, we propose an ID-oriented data construction pipeline to assemble the training data. Under the nourishment of the dataset constructed through the proposed pipeline, our PhotoMaker demonstrates better ID preservation ability than test-time fine-tuning based methods, yet provides significant speed improvements, high-quality generation results, strong generalization capabilities, and a wide range of applications. Our project page is available at https://photo-maker.github.io/

class word, id image, photomaker, (14 more...)

arXiv.org Artificial Intelligence

2312.04461

Country:

North America > United States (0.93)
Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Information Technology (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

SIO: Synthetic In-Distribution Data Benefits Out-of-Distribution Detection

Zhang, Jingyang, Inkawhich, Nathan, Linderman, Randolph, Luley, Ryan, Chen, Yiran, Li, Hai

arXiv.org Artificial IntelligenceMar-25-2023

Building up reliable Out-of-Distribution (OOD) detectors is challenging, often requiring the use of OOD data during training. In this work, we develop a data-driven approach which is distinct and complementary to existing works: Instead of using external OOD data, we fully exploit the internal in-distribution (ID) training set by utilizing generative models to produce additional synthetic ID images. The classifier is then trained using a novel objective that computes weighted loss on real and synthetic ID samples together. Our training framework, which is termed SIO, serves as a "plug-and-play" technique that is designed to be compatible with existing and future OOD detection algorithms, including the ones that leverage available OOD training data. Our experiments on CIFAR-10, CIFAR-100, and ImageNet variants demonstrate that SIO consistently improves the performance of nearly all state-of-the-art (SOTA) OOD detection algorithms. For instance, on the challenging CIFAR-10 v.s. CIFAR-100 detection problem, SIO improves the average OOD detection AUROC of 18 existing methods from 86.25\% to 89.04\% and achieves a new SOTA of 92.94\% according to the OpenOOD benchmark. Code is available at https://github.com/zjysteven/SIO.

artificial intelligence, detection, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2303.14531

Country: Oceania > New Zealand > South Island > Marlborough District > Blenheim (0.04)

Genre: Research Report > New Finding (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Zero-shot learning: Using text to accurately ID images - Facebook Code

#artificialintelligenceNov-16-2018, 23:47:18 GMT

Zero-shot learning (ZSL) is a process by which a machine learns to recognize objects it has never seen before. Researchers at Facebook have developed a new, more accurate ZSL model that uses neural net architectures called generative adversarial networks (GANs) to read and analyze text articles, and then visually identify the objects they describe. This novel approach to ZSL allows machines to classify objects based on category, and then use that information to identify other similar objects, as opposed to learning each object individually, as other models do. Researchers trained this model, called generative adversarial zero-shot learning (GAZSL), to identify more than 600 classes of birds across two databases containing more than 60,000 images. It was then given web articles and asked to use the information there to identify birds it had not seen before.

large language model, machine learning, natural language, (8 more...)

#artificialintelligence

Technology: